1.  Introduction

Scientific  discovery  is  a  complex,  ill-defined  activity,  and  one  of  the  most  profitable  ways  to 
study  such  phenomena  is  to  construct  intelligent  programs  that  model  them.  In  this  paper  we 
describe  bacon.5,  a  program  that  discovers  empirical  laws  for  summarizing  data.  The  core  of  this 
system  is  a  set  of  general,  data-driven  heuristics  for  detecting  numerical  relations  and  proposing  new 
terms  to  express  these  relations.  However,  the  program  also  incorporates  expectation-driven  rules 
that  let  it  take  advantage  of  its  earlier  discoveries.  Before  moving  on  to  describe  the  system  in  detail, 
we  should  first  review  some  of  the  earlier  Artificial  Intelligence  research  on  discovery,  and  outline  the 
scope  and  limitations  of  the  current  project. 

One  of  the  earliest  attempts  to  model  scientific  discovery  was  the  simulation  work  of  Gerwin  [1]. 
Gerwin  was  interested  in  how  humans  could  infer  numerical  laws  or  functions,  given  knowledge  of 
specific  data  points.  Of  course,  such  descriptive  discovery  is  only  one  part  of  the  total  scientific 
process.  In  order  to  understand  this  process,  he  gave  subjects  several  sets  of  data  and  asked  them  to 
find  the  relationships  which  best  summarized  each  data  set.  Using  the  verbal  protocols  collected  from 
this  task,  Gerwin  built  a  working  simulation  of  the  subjects’  behaviors.  The  model  first  attempted  to 
identify  a  general  pattern  in  the  data,  such  as  a  periodic  trend  with  increasing  amplitudes,  or  a 
monotonic  decreasing  trend.  A  class  of  functions  was  stored  with  each  pattern  the  program  could 
recognize;  once  a  class  was  hypothesized,  the  system  attempted  to  determine  the  specific  function 
responsible  for  the  data.  If  unexplained  variance  remained,  the  program  treated  the  differences 
between  the  observed  and  predicted  values  as  a  new  set  of  data.  This  procedure  was  used  to 
elaborate  the  hypothesis  until  no  pattern  could  be  found  in  the  residual  data.  The  program  also  had 
the  ability  to  backtrack  if  the  latest  addition  to  the  rule  failed  to  improve  predictions.  One  limitation  of 
Gerwin's  simulation  was  that  the  program  incorporated  specific  knowledge  about  the  shapes  of 
functions  within  a  specified  range.  Therefore,  these  functions  could  not  have  variable  parameters 
associated  with  them.  Even  though  Gerwin’s  model  could  only  solve  a  very  restricted  range  of 
problems,  it  was  an  important  step  in  understanding  the  discovery  process. 

Another early discovery system was dendral [2], a program that identified organic molecules
from  mass  spectrograms  and  nuclear  magnetic  resonances.  The  system  identified  chemical 
structures  in  three  main  stages  -  planning,  generating  plausible  structures,  and  testing  those 
structures.  The  first  stage  used  patterns  in  the  data  to  infer  that  certain  familiar  molecules  were 
present.  Considering  these  molecules  as  units  drastically  reduced  the  number  of  structures 
produced  during  the  generation  stage.  This  second  phase  used  knowledge  of  valences,  chemical 
stability,  and  user-specified  constraints  to  generate  all  plausible  chemical  structures.  In  the  final 
testing  stage,  the  system  predicted  mass  spectrograms  for  each  of  these  structures,  which  were  then 
ranked  according  to  their  agreement  with  the  data.  Dendral  relied  on  considerable  domain-specific 
knowledge,  which  was  laboriously  acquired  through  interaction  with  human  experts  in  organic 
chemistry.

In order to reduce their dependence on human experts, the same researchers designed
meta-dendral  [3],  a  system  that  acquired  knowledge  of  mass  spectroscopy  which  could  then  be 
used  by  the  dendral  program.  Meta-dendral  was  provided  with  known  organic  compounds  and 
their  associated  mass  spectrograms,  from  which  it  formulated  rules  to  explain  these  data.  Two  types 
of  events  were  used  to  explain  spectrograms  -  cleavages  in  the  bonds  of  a  molecule  and 
migrations  of  atoms  from  one  site  to  another.  Although  plausible  actions  were  determined  using 
domain-specific  chemical  knowledge,  the  conditions  on  rules  were  found  through  a  much  more 
general  technique  [4].  Meta-dendral  has  successfully  discovered  new  rules  of  mass  spectroscopy 
for  three  related  families  of  organic  molecules. 

Lenat [5] has described am, a system that has rediscovered important concepts from number
theory. The program began with some 100 basic concepts such as sets, lists, equality, and
operations, along with some 250 heuristics to direct the discovery process. These heuristics were
responsible for filling the facets of concepts, suggesting new tasks, and creating new concepts based
on existing ones. New tasks were ordered according to their interestingness, with tasks proposed
by a number of different heuristics tending to be more interesting than those proposed by a single rule.
Using  this  measure  to  direct  its  search  through  the  space  of  mathematical  concepts,  am  defined 
concepts  for  the  integers,  multiplication,  divisors-of,  prime  numbers,  and  the  unique  factorization 
theorem.  Like  meta-dendral,  Lenat’s  system  incorporated  some  very  general  strategies,  as  well  as 
some  domain-specific  knowledge  about  the  field  of  mathematics. 

In  our  work  on  bacon,  we  have  attempted  to  develop  a  general  purpose  descriptive  discovery 
system.  Rather  than  relying  on  domain-dependent  heuristics,  as  many  of  the  earlier  discovery 
systems  have  done,  bacon  incorporates  weak  yet  general  heuristics  that  can  be  applied  to  many 
different  domains.  The  current  version  addresses  only  the  descriptive  component  of  scientific 
discovery.  It  does  not  attempt  to  construct  explanations  of  phenomena,  such  as  the  atomic  theory  or 
the kinetic theory of gases, but we will have more to say on this in a later section. Neither is the
system  meant  to  replicate  the  historical  details  of  various  scientific  discoveries,  though  of  course  we 
find  those  details  interesting.  Instead,  it  is  intended  as  a  model  of  how  discoveries  might  occur  in 
these  domains. 

Descriptive  discovery  may  take  either  of  two  basic  forms:  one  may  start  from  the  data  and  use 
very  general  strategies  to  uncover  regularities  in  those  data;  or  one  may  bring  certain  expectations  to 
the  task  and  examine  the  data  to  see  if  they  match  those  expectations.  Earlier  versions  of  bacon  [6, 7, 
8]  relied  entirely  on  data-driven  discovery  methods.  The  current  version  takes  advantage  of  these 
heuristics,  but  also  incorporates  a  number  of  expectation-driven  discovery  techniques.  The  latter 
take  advantage  of  discoveries  that  have  already  been  made  to  direct  and  simplify  the  search  process 
in  new  situations.  We  have  chosen  to  organize  the  paper  around  the  system’s  discovery  methods. 
Since  the  expectation-driven  heuristics  work  with  the  results  of  the  data-driven  approaches,  we  will 
begin  by  focusing  on  the  data-driven  components  and  then  move  on  to  their  expectation-driven 
counterparts. Both types of heuristics are implemented as condition-action rules in Forgy's [9] OPS4
production  system  formalism. 

2.  Discovering  Numeric  Relations 

Bacon.5's most basic heuristic attempts to discover polynomial relations between two variables
that  take  on  numeric  values.  This  rule  computes  the  successive  derivatives  of  one  term  with  respect 
to  the  other,  until  it  arrives  at  a  set  of  constant  values.  The  level  of  the  constant  derivative  tells  bacon 
the  highest  power  necessary  in  the  polynomial  it  seeks,  while  the  constant  determines  the  coefficient 
of  this  term.  As  in  Gerwin’s  system,  this  component  is  subtracted  out,  and  the  technique  is  repeated 
on  residual  values.  This  process  continues  until  all  of  the  variance  has  been  accounted  for,  and  the 
program  has  determined  the  complete  functional  relation  between  the  two  variables. 


Table 1. Determining the coefficient of a quadratic term.

As an example, let us consider bacon.5's use of this heuristic to discover the law y = 3x^2 +
2x + 1. The program begins by examining values of the dependent term y for different values of the
independent term x, as shown in Table 1. Since y is not constant, the system computes the values of
y', the first derivative with respect to x. In the table, the first value of y' is (34 - 6)/(3 - 1) = 14, while
the second value is (121 - 34)/(6 - 3) = 29. Since these values are not constant either, bacon
examines the second derivative y'', basing its computation on the values of y' and x. Thus, the first
value of y'' is (29 - 14)/(6 - 1) = 3, while the second is (50 - 29)/(10 - 3) = 3. In this case, the
program finds the constant value it seeks; this tells bacon that an x^2 term is present in the final law,
and that its coefficient is 3.

However, more remains to be done before the discovery is complete. After subtracting out the
3x^2 term, bacon attempts to relate the values of y - 3x^2 to the independent term x, as shown in Table
2. This time the first derivative is the constant 2, implying that an x term with a coefficient of 2 is also
present in the final law. Subtracting this new component out as well, the constant value 1
immediately results, as we see in Table 3. Bacon.5 includes this value as the final term in the law it
has discovered, y = 3x^2 + 2x + 1, which completely summarizes the original set of observed data.
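
The worked example above can be sketched in a few lines of Python. This is only an illustrative sketch, not the OPS4 production rules used by the system: the function names and the tolerance `tol` are assumptions introduced here, and the x values 1, 3, 6, and 10 are inferred from the arithmetic quoted in the text.

```python
def constant_level(xs, ys, tol=1e-9):
    """Difference ys against xs until the values become constant.

    Returns (level, constant): the order of the first constant
    derivative and its value, computed as in the worked example.
    """
    level, vals = 0, list(ys)
    while len(vals) >= 2:
        if max(vals) - min(vals) <= tol:
            return level, vals[0]
        level += 1
        # Divide each successive difference by the spread of x values it spans.
        vals = [(vals[i + 1] - vals[i]) / (xs[i + level] - xs[i])
                for i in range(len(vals) - 1)]
    raise ValueError("not enough data to find a constant derivative")


def discover_polynomial(xs, ys, tol=1e-9):
    """Return {power: coefficient} by repeatedly removing the leading term."""
    terms, residual, previous = {}, list(ys), None
    while True:
        power, coeff = constant_level(xs, residual, tol)
        if previous is not None and power >= previous:
            raise ValueError("residuals did not simplify")
        previous = power
        if abs(coeff) > tol:
            terms[power] = coeff
        if power == 0:
            return terms
        # Subtract the component just found and repeat on the residuals.
        residual = [r - coeff * x ** power for x, r in zip(xs, residual)]


# Data consistent with y = 3x^2 + 2x + 1 at the x values implied by the text.
law = discover_polynomial([1, 3, 6, 10], [6, 34, 121, 321])
```

On these data the sketch finds the constant second derivative 3, then the constant first derivative 2 on the residuals, and finally the constant 1, matching the three steps described in the text.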


Table  2.  Determining  the  coefficient  of  a  linear  term. 

This  method  lets  bacon.5  discover  any  of  a  large  class  of  functions  that  can  be  expressed  as 
polynomials  with  integer  powers  and  real  coefficients.  In  cases  where  no  polynomial  can  be  found, 
the  system  considers  various  powers  of  the  dependent  term,  so  that  an  even  larger  set  of  relations 
can be discovered. Thus, bacon can uncover relations such as y^2 = 6.71x^3 + 4.23x and y^-1 =
3.5x^2. The system entertains only one hypothesis at a time, and since simpler relations are
considered  before  more  complex  ones,  they  are  preferred  if  they  are  found  to  hold. 


Table  3.  Determining  the  constant  term  in  an  equation. 

3.  Recursing  to  Higher  Levels  of  Description 

By  itself,  the  above  differencing  heuristic  can  discover  numeric  relations  between  two 
variables,  but  more  complex  relations  lie  beyond  its  scope.  In  order  to  find  laws  relating  many  terms, 
bacon.5  invokes  a  second  data-driven  heuristic  that  lets  it  summarize  regularities  at  different  levels 
of  description.  Upon  discovering  a  law  at  one  level,  this  method  stores  the  coefficients  from  that 
law  at  the  next  higher  level.  Once  enough  of  these  higher  level  values  have  been  gathered,  bacon 
attempts  to  relate  them  to  the  independent  term  that  was  varied  in  each  of  the  experiments.  The 
system  employs  the  same  differencing  technique  to  find  the  second  level  law  as  it  did  at  lower  levels. 
After  a  law  at  the  second  level  has  been  found,  the  program  recurses  to  still  higher  levels,  until  all  of 
the  data  have  been  summarized. 

Bacon.5's discovery of the ideal gas law provides a useful example of this strategy. This law
may be stated as PV = 8.32N(T - 273), where P is the pressure on a quantity of gas, the dependent
term V is the volume of the gas, T is the temperature of the gas in degrees Celsius, and N is the
quantity of gas in moles. In uncovering this law, bacon first finds the relation V^-1 = aP, where a is a
parameter that varies with different values of T and N. Upon comparing the values of a and T, the
system finds the law a^-1 = bT + c, where b and c represent second level parameters that potentially
vary with N. Finally, the program finds that b = dN, and that c = eN. Substituting these relations into
the first law, we arrive at the equation V^-1 = P(dNT + eN)^-1. Bacon.5 calculates the value of d to be
8.32, and e to be -2271.36. When the factor 8.32 is divided out, e becomes -273, or the absolute
zero point expressed in the Celsius scale. Thus, the equation is equivalent to the standard form of the
ideal gas law. Table 4 summarizes the steps taken in this discovery, comparing bacon's version of
the  law  with  the  standard  version,  and  showing  the  independent  terms  held  constant  at  each  level  of 
description. 


BACON'S VERSION        STANDARD VERSION        CONSTANT TERMS

1/V = aP               PV = k                  N, T
1/V = P/(bT + c)       PV = k(T - 273)         N
1/V = P/(dNT + eN)     PV = 8.32N(T - 273)

Table 4. Summary of ideal gas law discovery.
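
The three levels of description summarized in Table 4 can be illustrated with a short Python sketch. It rests on several assumptions: the "experiment" is simulated from the law as stated in the text, the form of the law at each level is fixed in advance (the system itself finds each form with its differencing heuristic), ordinary least squares stands in for that heuristic, and the temperature values are chosen so that the simulated volumes stay positive. All function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def simulated_volume(p, t, n):
    # Stand-in for running an experiment: the law as stated in the text.
    return 8.32 * n * (t - 273) / p


def discover_gas_law(pressures, temperatures, moles):
    b_values, c_values = [], []
    for n in moles:
        inverse_a = []
        for t in temperatures:
            # Level 1: with N and T held constant, fit 1/V = aP and keep a.
            inv_v = [1 / simulated_volume(p, t, n) for p in pressures]
            a, _ = fit_line(pressures, inv_v)
            inverse_a.append(1 / a)
        # Level 2: with N held constant, fit 1/a = bT + c.
        b, c = fit_line(temperatures, inverse_a)
        b_values.append(b)
        c_values.append(c)
    # Level 3: relate the second level parameters to N (b = dN, c = eN).
    d, _ = fit_line(moles, b_values)
    e, _ = fit_line(moles, c_values)
    return d, e


d, e = discover_gas_law([1.0, 2.0, 3.0], [300.0, 320.0, 340.0], [1.0, 2.0, 3.0])
```

Each level stores the fitted parameters as data for the next level, so the same fitting routine is reused throughout; only the term being varied changes.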

Taken  together,  the  heuristics  for  relating  numeric  terms  and  recursing  to  higher  levels  give 
bacon.5  considerable  power.  Using  these  two  strategies,  the  system  has  successfully  rediscovered 
versions  of  Coulomb’s  law  of  electrical  attraction,  Kepler’s  third  law  of  planetary  motion,  and  Ohm’s 
law  for  electrical  circuits.  Table  5  presents  the  forms  of  these  laws,  along  with  that  for  the  ideal  gas 
law.  Variables  are  shown  in  upper  case,  while  coefficients  are  given  in  lower  case.  Superficially,  the 
equations  in  the  table  have  quite  different  forms,  yet  all  can  be  expressed  as  combinations  of  the 
polynomial  relations  for  which  bacon  searches. 


Ideal gas law           PV = rNT
Coulomb's law           F = aQ1Q2/D^2
Kepler's third law      D^3/P^2 = k
Ohm's law               v = rI + IL

Table 5. Numeric laws discovered by bacon.5.


4.  Postulating  Intrinsic  Properties 

The  heuristics  we  have  discussed  so  far  are  fine  for  relating  numeric  terms,  but  they  are  of  little 
use  when  an  independent  term  takes  on  nominal  or  symbolic  values.  In  such  cases,  bacon.5  draws 
on  a  third  data-driven  heuristic  that  postulates  intrinsic  properties.  This  rule  associates  the  values 
of the numeric dependent term with the nominal independent values, and retrieves them in later
situations.  In  this  context,  bacon  moves  beyond  the  relatively  simple  process  of  curve  fitting,  and 
takes  on  some  features  of  explanatory  discovery. 

For example, consider a version of Ohm's experiment in which the batteries and wires take on
nominal values, so that one can distinguish between them but measure none of their characteristics.
Ohm's law may be stated as I = V/R, where I is the current flowing through a circuit, V is the voltage
associated with a battery, and R is the resistance of the wire. (We assume here that the internal
resistance is negligible.) Table 6 presents data that might be gathered in an experiment with three
batteries (A, B, and C) and three wires (X, Y, and Z). The values of the current were calculated on the
assumption that V_A = 4.613, V_B = 5.279, V_C = 7.382, R_X = 1.327, R_Y = 0.946, and R_Z =
1.508.


BATTERY    WIRE    CURRENT    CONDUCTANCE    SLOPE

A          X       3.4763     3.4763         1.0
A          Y       4.8763     4.8763         1.0
A          Z       3.0590     3.0590         1.0
B          X       3.9781     3.4763         1.1444
B          Y       5.5803     4.8763         1.1444
B          Z       3.5007     3.0590         1.1444
C          X       5.5629     3.4763         1.6003
C          Y       7.8034     4.8763         1.6003
C          Z       4.8952     3.0590         1.6003

Table 6. Postulating the property of conductance.

Focusing  on  the  first  three  rows  of  this  table,  bacon.5  finds  that  with  the  battery  set  to  A  and 
varying  the  wire,  the  current  of  the  circuit  varies  as  well.  Since  it  cannot  apply  its  numeric  heuristic  in 
this  situation,  the  program  proposes  conductance  as  an  intrinsic  property  of  the  wire,  and  bases  the 
values  of  this  new  term  on  those  of  the  current.  Having  done  this,  bacon  can  apply  its  differencing 
heuristic,  and  finds  a  linear  relation  between  the  current  and  the  new  property,  with  a  slope  of  one.  Of 
course,  this  is  hardly  surprising,  since  the  conductance  was  defined  so  that  this  relation  would  hold. 

However,  upon  varying  the  values  of  the  battery,  bacon  retrieves  the  same  values  of  the 
conductance  in  the  new  situations,  as  shown  in  the  fourth  through  ninth  rows.  When  these  are 
compared  to  the  currents,  the  system  discovers  other  linear  relations  with  different  slopes.  After 
recursing  to  a  higher  level  of  description,  bacon  uses  these  new  parameters  to  postulate  an  intrinsic 
property  associated  with  the  battery,  which  we  would  call  the  voltage.  The  retrieval  technique  is 
actually  stated  as  a  separate  heuristic,  and  shows  more  similarity  to  the  expectation-driven  heuristics 
we shall discuss later than to the data-driven ones. We have mentioned it here because the data-driven
process of postulating an intrinsic property has little purpose without the ability to retrieve the
associated values at later times.

Ohm's law                           v = rI
Archimedes' law of displacement     d = W/v
The law of definite proportions     k = We/Wc

Table 7. Laws discovered with intrinsic properties.

Unfortunately,  the  discovery  of  intrinsic  properties  is  more  complex  than  we  have  made  it 
appear.  Some  properties  exist  which  are  associated  not  with  one,  but  with  many,  nominal  terms.  An 
obvious  example  is  the  coefficient  of  friction,  which  is  a  function  of  pairs  of  surfaces.  To  avoid 
difficulties  in  such  cases,  bacon.5  takes  a  conservative  path  by  comparing  different  sets  of  intrinsic 
values.  If  a  linear  relation  is  found,  the  system  generalizes  and  retrieves  values  as  in  the  Ohm's  law 
example.  However,  if  no  relation  is  found,  it  retains  the  additional  conditions.  Table  7  lists  some  of 
the laws rediscovered by bacon.5 that incorporate intrinsic properties. These include a version of
Archimedes'  law  of  displacement,  in  which  the  system  computes  the  volumes  of  irregular  solids  as 
well  as  their  density,  and  Proust’s  law  of  definite  proportions,  in  which  a  constant  weight  ratio  is 
associated  with  an  element-compound  pair. 
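
The Ohm's law example of Table 6 can be sketched as follows. This is a simplified illustration under stated assumptions: the dictionary layout and function names are hypothetical, the intrinsic values are generalized across batteries without the conservative check described above, and a least-squares slope through the origin stands in for the differencing heuristic.

```python
# Currents from Table 6, keyed by (battery, wire).
currents = {
    ("A", "X"): 3.4763, ("A", "Y"): 4.8763, ("A", "Z"): 3.0590,
    ("B", "X"): 3.9781, ("B", "Y"): 5.5803, ("B", "Z"): 3.5007,
    ("C", "X"): 5.5629, ("C", "Y"): 7.8034, ("C", "Z"): 4.8952,
}


def postulate_property(observations, fixed_battery):
    """Copy the dependent values seen for one battery onto the wires."""
    return {wire: value
            for (battery, wire), value in observations.items()
            if battery == fixed_battery}


def slope_through_origin(xs, ys):
    """Least-squares slope for y = slope * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def battery_slopes(observations, conductance):
    """Retrieve the stored intrinsic values and relate them to the current."""
    wires = sorted(conductance)
    xs = [conductance[w] for w in wires]
    slopes = {}
    for battery in sorted({b for b, _ in observations}):
        ys = [observations[(battery, w)] for w in wires]
        slopes[battery] = slope_through_origin(xs, ys)
    return slopes


conductance = postulate_property(currents, "A")
slopes = battery_slopes(currents, conductance)
```

The slope for battery A is 1.0 by construction, while the slopes for B and C agree with the SLOPE column of Table 6 up to rounding; these slopes are the values from which the voltage property would be postulated.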

5.  Finding  Common  Divisors 

The  history  of  chemistry  from  1800  to  1860  provides  some  additional  examples  of  the  discovery 
of  intrinsic  properties,  with  an  interesting  complication.  In  1808,  John  Dalton  set  forth  the  law  of 
simple  proportions,  which  stated  that  when  two  elements  could  combine  to  form  different 
compounds,  the  weights  contributed  by  one  element  for  a  constant  weight  of  the  other  always 
occurred  in  small  integer  proportions  to  each  other.  In  1809,  Joseph  Gay-Lussac  found  evidence 
for  his  law  of  combining  volumes,  which  stated  that  a  similar  relation  held  for  the  relative  volumes 
contributed  by  gaseous  elements  in  chemical  reactions.  Again,  in  1815,  William  Prout  noted  that  the 
atomic  weights  of  the  known  elements  were  all  very  nearly  divisible  by  the  weight  of  hydrogen.  And 
finally,  in  1860,  Stanislao  Cannizzaro  pointed  out  that  when  a  given  element  took  part  in  different 
reactions,  the  ratios  of  the  element’s  weight  and  the  volume  of  the  resulting  compound  always 
occurred  in  small  integer  proportions. 

Bacon.5  incorporates  a  fourth  data-driven  heuristic  that  enables  it  to  discover  these  regularities 
in  the  chemical  data.  When  the  system  is  about  to  postulate  a  new  intrinsic  property,  this  rule 
examines  the  dependent  values  to  see  if  they  have  a  common  divisor.  If  none  can  be  found,  then  the 
process  continues  as  described  in  the  last  section.  However,  if  the  numbers  can  be  evenly  divided, 
then  the  resulting  integers  are  used  as  the  intrinsic  values  instead  of  the  original  numbers.  Also,  the 
common  divisor  is  associated  with  the  terms  that  were  held  constant,  instead  of  the  1.0  that  would 
normally  be  used.  This  means  that  even  in  cases  where  bacon.5  cannot  generalize  and  so  retrieve  a 
set  of  intrinsic  values  in  a  new  situation,  the  common  divisors  let  the  system  break  out  of  the 
tautological  circle  and  make  further  interesting  discoveries. 


ELEMENT     COMPOUND    We/Vc     INTEGER    DIVISOR

HYDROGEN    WATER       0.0892    2.0        0.0446
HYDROGEN    AMMONIA     0.1338    3.0        0.0446
HYDROGEN    ETHYLENE    0.0892    2.0        0.0446
OXYGEN      N2O         0.715     1.0        0.715
OXYGEN      SO2         1.430     2.0        0.715
OXYGEN      CO2         1.430     2.0        0.715
NITROGEN    N2O         1.250     2.0        0.625
NITROGEN    AMMONIA     0.625     1.0        0.625
NITROGEN    NO2         0.625     1.0        0.625

Table 8. Bacon.5's rediscovery of Cannizzaro's law.

Table 8 summarizes bacon.5's reformulation of Cannizzaro's discovery. The system is given
control over two independent nominal terms - one of the elements entering into a reaction, and the
resulting compound. The dependent variable is We/Vc, or the weight of the element used in the
reaction, divided by the volume of the compound that results. For the element hydrogen, different
compounds lead to different values of We/Vc, so the system postulates an intrinsic property.
However, the dependent values are all divisible by 0.0446, so the integers 2, 3, and 2 are used as
the intrinsic values instead of the originals. This process is repeated with the elements oxygen and
nitrogen, but in these cases the divisors 0.715 and 0.625 are found instead. The integers in the
table correspond to the coefficients on the given elements in the balanced equations for each
reaction, while the divisors correspond to the relative atomic weights of the elements. When these
divisors are carried along to the next level of description, bacon.5 also notes that they can all be
divided by the value associated with hydrogen; this statement is a variant on Prout's hypothesis. By
searching  for  common  divisors,  bacon  has  replicated  some  of  the  major  empirical  discoveries  of 
nineteenth  century  chemistry. 
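
One way to search for a common divisor among real-valued measurements is a tolerant version of Euclid's algorithm. The text does not say how bacon.5 performs this test, so the sketch below is an assumption throughout: the function names, the tolerance `tol`, and the `max_integer` cutoff used to decide that no useful divisor exists are all hypothetical.

```python
from functools import reduce


def approx_gcd(a, b, tol=1e-6):
    """Euclid's algorithm on reals, treating remainders below tol as zero."""
    while b > tol:
        # A nearest-integer remainder avoids float results just below b.
        a, b = b, abs(a - b * round(a / b))
    return a


def common_divisor(values, tol=1e-6, max_integer=10):
    """Return (divisor, integers), or None if only large integers would fit."""
    divisor = reduce(lambda a, b: approx_gcd(a, b, tol), values)
    integers = [round(v / divisor) for v in values]
    if max(integers) > max_integer:
        return None
    return divisor, integers


# Dependent values from Table 8 for each element.
hydrogen = common_divisor([0.0892, 0.1338, 0.0892])
oxygen = common_divisor([0.715, 1.430, 1.430])
nitrogen = common_divisor([1.250, 0.625, 0.625])
```

For the Table 8 values this recovers the divisors 0.0446, 0.715, and 0.625 together with the integer columns shown in the table. Values with no small-integer relationship produce a very small divisor and large integers, which the sketch reports as `None`.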

6.  Expecting  Similar  Relations 

We have now completed our survey of bacon.5's data-driven heuristics. The remainder of the
system’s  strategies  draw  upon  information  gathered  in  this  bottom-up  manner  to  reduce  search  at 
later  stages.  Thus,  when  we  speak  of  expectation-driven  heuristics,  we  do  not  mean  to  imply  that 
bacon  starts  with  knowledge  of  a  particular  domain.  Rather,  we  mean  that  the  program  is  capable  of 
taking  advantage  of  discoveries  it  has  made  at  early  stages  to  simplify  this  process  at  later  points. 

The simplest of these heuristics proposes that if bacon.5 has found a law in one context (i.e.,
when  certain  variables  are  held  constant),  it  should  expect  a  similar  form  of  law  to  hold  in  a  new 
context  (i.e.,  when  those  terms  take  on  different  values).  For  example,  this  similar  relations 
heuristic  could  be  used  after  the  system  has  discovered  Kepler’s  third  law  for  the  planets  orbiting  the 
sun, to predict an analogous law to hold for the moons of Jupiter. Specifically, if the law D^3 = 1.0P^2
were found in the first situation, bacon.5 expects that a law of the form D^3 = kP^2 would hold in the
new  case,  though  it  would  not  yet  know  the  value  of  the  parameter  k.  Such  a  prediction  allows  bacon 
to  replace  its  search  through  the  space  of  possible  relationships  between  two  variables  with  a  simple 
calculation  designed  to  test  the  expected  relationship.  If  this  relationship  holds,  bacon  calculates  the 
values  of  the  unknown  parameters  and  moves  on  to  further  discoveries. 
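
The "simple calculation" that replaces search can be sketched as a direct test of the expected form. The code below is a hedged illustration rather than the system's rule: the function name and the relative tolerance are assumptions, and the orbital figures for the four Galilean moons are approximate, rounded values that do not come from the paper.

```python
def check_expected_law(observations, d_power=3, p_power=2, rel_tol=0.01):
    """Test whether D**d_power = k * P**p_power holds; return k or None."""
    ratios = [d ** d_power / p ** p_power for d, p in observations]
    k = sum(ratios) / len(ratios)
    # Accept the expected form only if every ratio is close to the mean.
    if all(abs(r - k) <= rel_tol * abs(k) for r in ratios):
        return k
    return None


# (distance in thousands of km, period in days); approximate values,
# supplied here for illustration only.
JOVIAN_MOONS = [
    (421.7, 1.769),     # Io
    (671.1, 3.551),     # Europa
    (1070.4, 7.155),    # Ganymede
    (1882.7, 16.689),   # Callisto
]

k_jupiter = check_expected_law(JOVIAN_MOONS)
```

Because the form of the law is already known, a single pass over the data both tests the expectation and estimates the parameter k; no differencing is needed.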

Previous  versions  of  bacon  always  utilized  the  same  number  of  observations  to  find 
relationships  between  variables  in  its  experiments.  However,  once  the  system  expects  a  particular 
form  of  a  law  to  hold,  it  can  determine  the  number  of  observations  necessary  to  estimate  the  desired 
parameters.  Using  this  data  reduction  heuristic,  bacon  only  collects  the  minimum  number  of 
observations  necessary  to  complete  its  description  of  the  current  law.  If  D  were  being  expressed  as  a 
function of P in the above example, bacon.5 would need only three data points to determine the value
of k for the Jovian moons.^1

Taken  together,  these  two  heuristics  significantly  reduce  the  program's  search  through  both 
the  space  of  data  and  the  space  of  rules.  The  actual  amount  of  savings  depends  on  the  number  of 
superfluous  data  points.  In  order  to  evaluate  the  impact  of  the  new  heuristics,  bacon  was  given  six 
values  of  each  independent  variable  in  four  separate  discovery  tasks.  Performance  of  the  purely 
data-driven  system  was  compared  to  systems  incorporating  the  expectation-driven  heuristics,  and  is 
shown  in  Table  9.  From  this  table,  it  can  be  seen  that  the  similar  relation  heuristic  only  resulted  in  a 
small  amount  of  savings.  This  result  is  somewhat  misleading,  because  the  amount  of  search  required 
by  the  differencing  technique  was  significantly  reduced;  however,  the  OPS4  interpreter  was  slowed  by 
the  inclusion  of  an  additional  condition-action  rule,  so  the  effect  was  masked.  For  more  complex 
forms  of  laws,  the  computational  savings  would  be  greater. 


                       DD    DD + SR    DD + SR + DR

IDEAL GAS LAW          35    34         21
COULOMB'S LAW          35    35         23
OHM'S LAW              3     3          3
KEPLER'S THIRD LAW     3     3          3

Table 9. Time to discover numeric laws in CPU seconds.^2

^1 Of course, more would be required if significant noise were present, but the principle of reduced data would remain.

^2 DD = data-driven heuristics, DD + SR = data-driven and similar relation heuristics, DD + SR + DR = data-driven, similar
relation, and data reduction heuristics.

The present system employs a few simple heuristics for dealing with noise. In executing the
differencing technique, bacon.5 checks the current derivative term to see if its values are constant.
All values which fall within a small interval of one another are accepted as equivalent. The program
also calculates the number of outliers, or exceptions to the current relationship. If the number of
exceptions is a small proportion of the total number of data points, bacon.5 decides the current term
is constant, and updates its functional description. Although these methods allow bacon.5 to cope
with modest amounts of noise, more sophisticated techniques might be required to deal with very
noisy data.
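
A constancy test of this kind might look like the sketch below. The text gives neither the size of the interval nor the acceptable proportion of outliers, so both defaults are assumptions, as is the choice of the median as the reference value.

```python
from statistics import median


def is_constant(values, interval=0.01, max_outlier_fraction=0.25):
    """Return (decision, reference value) for a noisy constancy test."""
    center = median(values)
    # Values farther than `interval` from the reference count as exceptions.
    outliers = [v for v in values if abs(v - center) > interval]
    return len(outliers) / len(values) <= max_outlier_fraction, center


noisy_constant = is_constant([3.0, 3.001, 2.999, 3.0, 7.5])
not_constant = is_constant([14.0, 29.0, 50.0])
```

In the first call a single exception out of five values is tolerated and the term is accepted as the constant 3.0; in the second, the first-derivative values from the quadratic example are rejected, so differencing would continue.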

One  such  technique  might  be  to  check  the  dependent  term  for  systematic  trends.  The  values  of 
y  in  Table  1  are  monotonically  increasing,  for  example,  which  suggests  a  higher  order  derivative 
should  be  calculated.  If  no  such  trends  were  found,  bacon.5  could  accept  the  current  relationship, 
even  though  the  number  of  outliers  was  large.  A  second  technique  would  be  to  allow  the  program  to 
store  several  possible  relationships  between  the  current  independent  and  dependent  terms.  Beam 
searching  techniques  could  be  used  to  limit  the  number  of  competing  hypotheses  bacon  entertained 
at  any  given  time,  and  the  program  could  design  critical  experiments  to  determine  the  best 
description  of  the  data.  Finally,  if  the  system  discovered  promising  relationships  in  parts  of  the  data, 
the  expectation-driven  heuristics  discussed  above  could  help  bacon  to  develop  a  consistent 
interpretation  of  the  data,  even  in  the  presence  of  substantial  noise.  Combining  these  techniques 
should  allow  bacon  to  deal  with  realistic  amounts  of  noise  in  data  in  a  robust  manner. 

7.  Discovering  Symmetrical  Laws 

The  assumption  of  symmetry  has  been  a  powerful  aid  in  the  discovery  of  physical  laws.  Table 
10  presents  three  well-known  laws  that  exhibit  symmetry.  Although  bacon.5  could  discover  these 
laws  without  any  heuristics  other  than  those  we  have  already  described,  the  inclusion  of  a  new 
component  that  postulates  symmetry  significantly  reduces  the  search  required  to  find  these  laws. 
This  new  heuristic  waits  until  all  the  terms  associated  with  an  object  have  been  related,  and  then 
assumes  that  the  same  relation  will  hold  for  a  second  set  of  terms  that  are  associated  with  an 
analogous  object.  The  resulting  complex  terms  are  then  combined  into  a  symmetrical  law. 


Snell's law of refraction      sine θ1/n1 = sine θ2/n2
Conservation of momentum       m1(V1 - U1) = -m2(V2 - U2)
Black's specific heat law      c1M1(T1 - F1) = -c2M2(T2 - F2)

Table 10. Symmetrical laws discovered by bacon.5.

As an example, consider bacon.5's discovery of Snell's law of refraction, as summarized in
Table 11. The program starts with two objects and two variables associated with each object - the
medium through which light passes, and the sine of the angle the light takes. Varying medium2
and holding medium1 and sine θ1 constant, the system postulates an intrinsic property, n2, whose
values are associated with different media. Of course, the ratio sine θ2/n2 has the constant value
1.0. At this point, bacon.5 relates the terms associated with the second object, and decides that it
should examine the values of sine θ1/n1 and relate them to the former ratio. Upon gathering
additional data, the program discovers that the two ratios are identical, or that sine θ1/n1 = sine
θ2/n2, which is one statement of Snell's law.


MEDIUM1    SIN θ1    MEDIUM2    SIN θ2    N2      SIN θ2/N2

VACUUM     0.25      WATER      0.33      0.33    1.0
VACUUM     0.25      OIL        0.37      0.37    1.0
VACUUM     0.25      GLASS      0.42      0.42    1.0

Table 11. Discovering Snell's law of refraction.
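
The path summarized in Table 11 can be sketched in a few lines of Python. This is an illustrative sketch rather than the actual implementation of bacon.5: the function names, the simulated `observe` experiment, and the hidden constants (chosen only so that the simulated readings reproduce the three rows of Table 11) are all assumptions, as is the inclusion of vacuum among the values of medium2.

```python
from itertools import product

# Hidden "world" constants. These are an assumption, chosen only so that
# the simulated readings reproduce the three rows of Table 11.
_HIDDEN = {"vacuum": 1.00, "water": 1.32, "oil": 1.48, "glass": 1.68}


def observe(medium1, sin1, medium2):
    """Simulated experiment: return sine theta2 for the given setup."""
    return sin1 * _HIDDEN[medium2] / _HIDDEN[medium1]


def postulate_intrinsic(medium1, sin1, media):
    """Vary medium2 while holding the first object's terms constant.

    The observed sines are stored as the intrinsic property n of each
    medium, so the ratio sine theta2 / n2 is 1.0 by construction.
    """
    return {m: observe(medium1, sin1, m) for m in media}


def symmetric_law_holds(n, media, sines, tol=1e-9):
    """Symmetry assumption: the ratio found for the second object is
    assumed to have the same form for the first object. Gather new data
    and check that sine theta1 / n1 equals sine theta2 / n2 throughout.
    """
    for m1, s1, m2 in product(media, sines, media):
        s2 = observe(m1, s1, m2)
        if abs(s1 / n[m1] - s2 / n[m2]) > tol:
            return False
    return True


media = list(_HIDDEN)
n = postulate_intrinsic("vacuum", 0.25, media)
print({m: round(v, 2) for m, v in n.items()})
print(symmetric_law_holds(n, media, [0.1, 0.25, 0.5]))
```

The point of the sketch is the order of operations: a relation is found for one object first, and the symmetry assumption then reduces the remaining work to a single equality check on new data instead of a fresh search over the first object's terms.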

The  bacon.5  system  has  discovered  two  other  symmetrical  laws  -  conservation  of  momentum 
and  Black’s  specific  heat  law  -  following  very  similar  paths.  Table  10  presents  the  full  form  of  the 
laws; directly observable terms are shown in upper case, while intrinsic properties are shown in lower
case.  The  program  has  also  discovered  two  different  versions  of  Joule’s  law  of  energy  conservation, 
using  a  simple  form  of  reasoning  by  analogy.  This  strategy  states  that  if  the  same  set  of  terms  occurs 
in  more  than  one  experiment,  one  should  consider  combining  them  in  the  same  fashion  as  proved 
useful  before.  For  a  more  complete  description  of  this  heuristic  and  its  application  to  Joule’s  law,  the 
reader  is  directed  to  an  earlier  article  on  bacon  [10]. 

In  summary,  we  have  seen  that  bacon’s  expectation-driven  heuristics  -  expecting  similar 
relations, reducing the data that are gathered, and postulating symmetrical laws - allow it to discover
empirical  laws  with  considerable  reduction  in  search.  Actual  computational  savings  for  three 
symmetric  laws  are  shown  in  Table  12.  From  this  table,  it  can  be  seen  that,  when  combined, 
bacon.5's  expectation-driven  heuristics  result  in  major  savings.  Moreover,  these  heuristics 
accomplish  this  with  little  loss  in  generality,  since  relations  such  as  symmetry  can  be  found  in  a  wide 
variety  of  scientific  domains. 


               DD      DD+SR+DR    DD+SR+DR+SY

MOMENTUM       515     212         8
SNELL'S LAW    40      40          5
BLACK'S LAW    8433    2200        23

Table 12. Time to discover symmetric laws in CPU seconds.³

³ DD = data-driven heuristics; DD + SR + DR = data-driven, similar relation, and data reduction
heuristics; DD + SR + DR + SY = data-driven, similar relation, data reduction, and symmetry heuristics.
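
The savings in Table 12 can be made concrete by dividing the time for the data-driven heuristics by the time for each richer set of heuristics. The short Python sketch below does only this arithmetic; the dictionary layout and the `speedup` helper are illustrative names rather than part of bacon, and the numbers are transcribed from the table.

```python
# CPU seconds transcribed from Table 12.
TIMES = {
    "momentum": {"DD": 515, "DD+SR+DR": 212, "DD+SR+DR+SY": 8},
    "snell": {"DD": 40, "DD+SR+DR": 40, "DD+SR+DR+SY": 5},
    "black": {"DD": 8433, "DD+SR+DR": 2200, "DD+SR+DR+SY": 23},
}


def speedup(law, baseline="DD", improved="DD+SR+DR+SY"):
    """Ratio of baseline time to improved time for one law."""
    row = TIMES[law]
    return row[baseline] / row[improved]


for law in TIMES:
    partial = speedup(law, improved="DD+SR+DR")
    full = speedup(law)
    print(f"{law}: {partial:.1f}x without symmetry, {full:.1f}x with symmetry")
```

On these figures the full set of heuristics is roughly 64 times faster than the data-driven heuristics alone for conservation of momentum, 8 times faster for Snell's law, and about 367 times faster for Black's law. Most of the gain comes from the symmetry heuristic, since the similar relation and data reduction heuristics alone give factors of about 2.4, 1.0, and 3.8.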

8. The Importance of Structure

In the previous sections, we have described the empirical discovery system bacon.5. Given a
set of numeric or nominal variables, this system employs a number of heuristics to determine the
relation between those terms. Yet it is worth noting that the most interesting of bacon’s heuristics
address aspects of discovery that lie beyond the simple relation of variables. For example, when an
intrinsic property is postulated, it is always associated with some object or class of objects. Similarly,
the symmetry and analogy heuristics apply only in situations where the same terms are associated
with different objects. (In the symmetry case, identical terms are attached to different objects within
an experiment, while in the analogy case the identity falls across experiments.)


In summary, these heuristics appear to incorporate some notion of structure which extends
beyond the simple variable-value representation used in bacon.5. Given this view, one drawback of
bacon is that it represents this structure implicitly rather than explicitly. Thus, in replicating
Ohm’s experiment, the program is told about the battery, the length of the wire, and the current, but it
does not understand that the wire must be connected to the battery to generate the current. Similarly,
in the conservation of momentum experiment, bacon is given variables for the objects along with
their initial and final velocities; however, it is unaware that the initial velocities are transformed into
the final velocities by a collision, and that if no collision occurs, the velocities will remain unchanged.
In other words, bacon.5 attempts to discover quantitative laws before it has mastered the qualitative
laws of structure [11]. This feat can be accomplished, but only if the system is presented with a set
of variables that have been carefully selected to contain those qualitative relationships.


Future  versions  of  bacon  should  represent  structural  relations  explicitly,  and  should  attempt  to 
discover  the  qualitative  laws  of  a  situation  (e.g.,  that  objects  collide  and  change  direction,  or  that 
some  chemicals  combine  to  form  new  chemicals)  before  moving  on  to  considering  quantitative  laws. 
Such  an  approach  would  be  much  more  consistent  with  historical  developments  in  science  than  the 
current  implementation.  Moreover,  once  the  system  has  arrived  at  a  structural  model  for  a  situation, 
this  model  may  find  another  use  as  an  explanation  for  a  quantitative  law  found  in  some  other 
situation.  This  is  an  important  point,  since  many  explanatory  theories  -  including  the  atomic  theory, 
the kinetic theory of gases, and the germ theory of disease - are primarily structural models. Thus,
by  exploring  the  role  of  structure  in  a  descriptive  discovery  system  like  bacon,  we  may  come  to  a 
fuller  understanding  of  explanatory  science  as  well. 


For instance, consider the kinetic theory of gases, which can be used to explain the ideal gas
law.  Central  to  the  kinetic  theory  is  the  notion  of  colliding  molecules  that  obey  conservation  of 
momentum.  The  hypothesis  that  a  gas  is  composed  of  microscopic  objects  (similar  to  their 
macroscopic  counterparts  in  the  momentum  experiment)  provides  an  explanation  of  the  macroscopic 
relation  between  temperature,  volume,  and  pressure.  We  do  not  claim  to  fully  understand  the  process 
by  which  such  explanations  are  constructed,  though  some  form  of  reasoning  by  analogy  seems  a 
likely candidate. In any case, the relation between qualitative laws of structure and explanation is a
promising  direction  for  future  research. 

It is interesting to note that one of bacon's current heuristics - searching for common divisors
-  could  play  an  important  role  in  such  an  explanatory  discovery  system.  This  results  from  the  fact 
that  the  existence  of  a  common  divisor  for  a  set  of  data  suggests  an  important  structural  aspect  of 
those  data,  namely  that  the  objects  involved  in  the  experiment  consist  of  quanta.  Thus,  one  can 
imagine  an  extended  version  of  bacon  that,  upon  finding  common  divisors  in  chemical  reactions, 
would  invoke  a  prototype  atomic  theory  to  explain  this  fact. 
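
To make the idea of a common divisor concrete, the following Python sketch searches for the largest value d such that every measurement is close to an integer multiple of d. It is a minimal illustration under stated assumptions, not the procedure bacon itself uses: the function name, the tolerance, the bound on multiples, and the sample values are all hypothetical.

```python
def common_divisor(values, max_multiple=10, tol=0.02):
    """Return (d, multiples) for the largest d such that every value lies
    within tol of an integer multiple of d, or None if no such d is found.

    Candidates are the smallest value divided by 1, 2, ..., max_multiple,
    tried from largest to smallest.
    """
    if not values or min(values) <= 0:
        raise ValueError("values must be a non-empty list of positive numbers")
    smallest = min(values)
    for k in range(1, max_multiple + 1):
        d = smallest / k
        multiples = [v / d for v in values]
        if all(abs(m - round(m)) <= tol for m in multiples):
            return d, [round(m) for m in multiples]
    return None


# Hypothetical measurements that are all multiples of 1.5.
print(common_divisor([3.0, 4.5, 7.5]))

# Hypothetical measurements with no small common divisor.
print(common_divisor([1.0, 1.4142, 1.7321]))
```

When a divisor is found, the integer multiples are the interesting output: they are the small whole numbers that would suggest the quantities come in discrete units. The tolerance stands in for measurement noise, and loosening it makes spurious divisors more likely.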

Finally,  we  should  note  that  an  emphasis  on  qualitative  laws  of  structure  may  provide  a  new 
approach  to  the  dual  problems  of  noise  and  irrelevant  variables.  Given  an  understanding  of  the 
structure  of  some  situation,  it  may  be  possible  to  eliminate  some  relationships  and  some  variables 
even  before  any  quantitative  data  are  gathered.  For  example,  given  the  principle  "no  action  at  a 
distance"  and  an  experimental  context  in  which  two  objects  never  touch  or  even  approach  each 
other,  one  can  immediately  predict  that  the  variables  associated  with  these  objects  will  be  unrelated. 
Again,  this  is  an  area  in  which  our  ideas  remain  vague,  but  it  is  also  an  area  that  deserves  further 
attention. 

9.  Conclusions 

In  this  paper  we  have  described  bacon.5,  an  empirical  discovery  system  that  draws  on  data- 
driven  heuristics  for  finding  numeric  relations  between  two  variables,  recursing  to  higher  levels  of 
description,  postulating  intrinsic  properties,  and  finding  common  divisors.  In  addition  to  its  data- 
driven  techniques,  bacon  also  incorporates  expectation-driven  heuristics  for  expecting  similar 
relations,  reducing  the  amount  of  data  that  must  be  gathered,  assuming  symmetrical  laws,  and 
reasoning  by  a  simple  type  of  analogy.  These  latter  rules  take  advantage  of  discoveries  bacon  has 
made  itself  instead  of  drawing  on  knowledge  about  some  particular  domain.  Thus,  the  program 
retains  considerable  generality,  as  evidenced  by  the  broad  range  of  laws  it  has  been  able  to  discover. 
In  addition,  the  expectation-driven  methods  reduce  the  overall  search  that  bacon  must  perform  in 
discovering  a  law. 

We  have  also  seen  that  some  of  bacon’s  heuristics  incorporate  a  notion  of  structure,  but  that 
this  knowledge  is  represented  implicitly.  Future  versions  of  the  system  should  represent  structural 
information  explicitly,  and  attempt  to  discover  qualitative  laws  before  moving  on  to  quantitative  ones. 
This  approach  should  provide  new  methods  for  handling  noise  and  determining  relevant  variables, 
but  it  may  do  more  than  simply  improve  bacon's  techniques  for  discovering  descriptive  laws.  We 
hope  that  a  concern  with  qualitative  laws  of  structure  will  shed  light  on  the  process  of  explanatory 
discovery  as  well. 